AITopics | attribute-steered detection

attribute-steered detection

Attacks Meet Interpretability: Attribute-steered Detection of Adversarial Samples

Neural Information Processing Systems, Nov-20-2025, 22:53:26 GMT

Adversarial sample attacks perturb benign inputs to induce DNN misbehaviors. Recent research has demonstrated the widespread presence and the devastating consequences of such attacks. Existing defense techniques either assume prior knowledge of specific attacks or may not work well on complex models due to their underlying assumptions. We argue that adversarial sample attacks are deeply entangled with interpretability of DNN models: while classification results on benign inputs can be reasoned based on the human perceptible features/attributes, results on adversarial samples can hardly be explained. Therefore, we propose a novel adversarial sample detection technique for face recognition models, based on interpretability. It features a novel bi-directional correspondence inference between attributes and internal neurons to identify neurons critical for individual attributes.

attack meet interpretability, attribute-steered detection, name change, (7 more...)

Genre: Research Report > New Finding (0.77)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.63)

Reviews: Attacks Meet Interpretability: Attribute-steered Detection of Adversarial Samples

Neural Information Processing Systems, Oct-8-2024, 01:53:32 GMT

In this paper the authors examine the intuition that interpretability can be the workhorse in detecting adversarial examples of different kinds. That is, if the humanly interpretable attributes are all the same for two images, then the prediction result should differ only if some non-interpretable neurons behave differently. Beyond adversarial examples, this work is also closely related to interpretability and explainability questions for DNNs. The basis of their detection mechanism (AmI) lies in determining the sets of neurons (which they call attribute witnesses) that correspond one-to-one to a humanly interpretable attribute (such as eyeglasses). That means that if the attribute does not change, the neuron should not give a different output, and conversely, if the attribute changes, the neuron's output should change.
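
The two-way condition described in the review can be illustrated with a small sketch. This is not the authors' implementation: the function name `attribute_witnesses`, the three activation vectors, and the fixed `change_threshold` are all assumptions made for illustration, and the page does not say how AmI actually measures whether a neuron "changes". The sketch simply keeps neurons whose activation moves when the attribute is altered and stays put when the attribute is kept but the rest of the image is altered.

```python
from typing import List, Sequence


def attribute_witnesses(
    base: Sequence[float],
    substituted: Sequence[float],
    preserved: Sequence[float],
    change_threshold: float = 0.5,  # hypothetical cutoff, not from the paper
) -> List[int]:
    """Return indices of neurons that satisfy both directions of the test.

    base:        activations for the original image
    substituted: activations after only the attribute is changed
    preserved:   activations after the attribute is kept and the rest changed
    """
    if not (len(base) == len(substituted) == len(preserved)):
        raise ValueError("activation vectors must have the same length")

    witnesses = []
    for i, (b, s, p) in enumerate(zip(base, substituted, preserved)):
        # Direction 1: the attribute changes, so the neuron should change.
        changes_with_attribute = abs(s - b) > change_threshold
        # Direction 2: the attribute is unchanged, so the neuron should not change.
        stable_without_change = abs(p - b) <= change_threshold
        if changes_with_attribute and stable_without_change:
            witnesses.append(i)
    return witnesses


# Toy activations for four hypothetical neurons.
base = [1.0, 2.0, 0.5, 3.0]
substituted = [1.1, 0.2, 2.5, 3.0]  # attribute (e.g. eyeglasses) altered
preserved = [1.0, 2.1, 2.4, 1.0]    # attribute kept, rest of the face altered

witness_ids = attribute_witnesses(base, substituted, preserved)
print(witness_ids)
```

In this toy data only neuron 1 passes both checks: neuron 2 reacts to the attribute change but also to unrelated changes, and neuron 3 reacts only to unrelated changes.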

attack meet interpretability, attribute-steered detection, neuron, (11 more...)

Genre: Summary/Review (0.36)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.55)

Attacks Meet Interpretability: Attribute-steered Detection of Adversarial Samples

Tao, Guanhong, Ma, Shiqing, Liu, Yingqi, Zhang, Xiangyu

Neural Information Processing Systems, Feb-14-2020, 19:56:58 GMT

adversarial sample, attack meet interpretability, attribute-steered detection, (5 more...)

Genre: Research Report > New Finding (0.80)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.73)
